This endpoint returns a list of pages whose content duplicates the content of the page specified in the target field of the POST request. For each page, you get its URL, IP address, size in bytes, meta tag data, server information, and other related data.
The returned results depend on the similarity parameter of the POST request, which accepts integer values from 0 to 4.
When it is set to 0, the API returns pages whose content is not similar (or only minimally similar) to the content of the target page. When it is set to 4, the API returns pages whose content is highly similar to the content of the target page.
Instead of 'login' and 'password', use your credentials from https://app.dataforseo.com/api-dashboard
# Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-dashboard
login="login"
password="password"
# Basic authentication token: base64 of "login:password"
cred="$(printf '%s' "${login}:${password}" | base64)"
curl --location --request POST "https://api.dataforseo.com/v3/backlinks/content_duplicates/live" \
  --header "Authorization: Basic ${cred}" \
  --header "Content-Type: application/json" \
  --data-raw '[
  {
    "target": "https://www.marthastewart.com/2226792/how-bathe-your-cat",
    "limit": 5,
    "similarity": 2
  }
]'
<?php
// You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
require('RestClient.php');
$api_url = 'https://api.dataforseo.com/';
// Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-dashboard
$client = new RestClient($api_url, null, 'login', 'password');
$post_array = array();
// simple way to set a task
$post_array[] = array(
    "target" => "https://dataforseo.com",
    "similarity" => 4,
    "limit" => 10
);
try {
    // POST /v3/backlinks/content_duplicates/live
    $result = $client->post('/v3/backlinks/content_duplicates/live', $post_array);
    print_r($result);
    // do something with post result
} catch (RestClientException $e) {
    echo "\n";
    print "HTTP code: {$e->getHttpCode()}\n";
    print "Error code: {$e->getCode()}\n";
    print "Message: {$e->getMessage()}\n";
    print $e->getTraceAsString();
    echo "\n";
}
$client = null;
?>
# You can download this file from here https://cdn.dataforseo.com/v3/examples/python/python_Client.zip
from client import RestClient

# Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-dashboard
client = RestClient("login", "password")
post_data = dict()
# simple way to set a task
post_data[len(post_data)] = dict(
    target="https://dataforseo.com",
    similarity=4,
    limit=10
)
# POST /v3/backlinks/content_duplicates/live
response = client.post("/v3/backlinks/content_duplicates/live", post_data)
# you can find the full list of the response codes here https://docs.dataforseo.com/v3/appendix/errors
if response["status_code"] == 20000:
    print(response)
    # do something with result
else:
    print("error. Code: %d Message: %s" % (response["status_code"], response["status_message"]))
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

namespace DataForSeoDemos
{
    public static partial class Demos
    {
        public static async Task backlinks_content_duplicates_live()
        {
            var httpClient = new HttpClient
            {
                BaseAddress = new Uri("https://api.dataforseo.com/"),
                // Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-dashboard
                DefaultRequestHeaders = { Authorization = new AuthenticationHeaderValue("Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes("login:password"))) }
            };
            var postData = new List<object>();
            postData.Add(new
            {
                target = "https://dataforseo.com",
                similarity = 4,
                limit = 10
            });
            // POST /v3/backlinks/content_duplicates/live
            // the full list of possible parameters is available in documentation
            // send the task array as UTF-8 JSON
            var content = new StringContent(JsonConvert.SerializeObject(postData), Encoding.UTF8, "application/json");
            var taskPostResponse = await httpClient.PostAsync("/v3/backlinks/content_duplicates/live", content);
            var result = JsonConvert.DeserializeObject<dynamic>(await taskPostResponse.Content.ReadAsStringAsync());
            // you can find the full list of the response codes here https://docs.dataforseo.com/v3/appendix/errors
            if (result.status_code == 20000)
            {
                // do something with result
                Console.WriteLine(result);
            }
            else
                Console.WriteLine($"error. Code: {result.status_code} Message: {result.status_message}");
        }
    }
}
All POST data should be sent in JSON format (UTF-8 encoding). Tasks are set with the POST method: send the parameters of each task as an object inside the top-level POST array.
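The RestClient helpers used in the examples above are separate downloads. If you prefer the Python standard library, the request described here can be assembled by hand: a JSON array of task objects, UTF-8 encoded, sent with Basic authentication as in the curl example. This is a minimal sketch; `build_request` and `send` are hypothetical helper names, and 'login'/'password' are placeholders for your API dashboard credentials.

```python
import base64
import json
import urllib.request

API_URL = "https://api.dataforseo.com/v3/backlinks/content_duplicates/live"


def build_request(login: str, password: str, tasks: list) -> urllib.request.Request:
    """Build a POST request whose body is a JSON array of task objects (UTF-8)."""
    # Basic authentication: base64 of "login:password", as in the curl example.
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    # Each task is one object inside the top-level JSON array.
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request("login", "password", [
        {"target": "https://dataforseo.com", "similarity": 4, "limit": 10},
    ])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Calling `send(build_request(...))` performs the actual network call. The sketch does not handle HTTP errors, which `urllib.request.urlopen` raises as `urllib.error.HTTPError`.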
Description of the fields for setting a task:
| Field name | Type | Description |
|---|---|---|
| target | string | page URL; required field; example: "https://www.marthastewart.com/2226792/how-bathe-your-cat". Note: you can specify only URLs in this field; when sending multiple requests simultaneously, the URLs in this field must belong to the same domain to avoid errors |
| similarity | integer | content similarity score; you can set this score from 0 to 4; when set to 0, the API returns pages with content not similar (or minimally similar) to the content of the target page; when set to 4, the API returns pages with content highly similar to the content of the target page; default value: 2 |
| limit | integer | the maximum number of returned pages; optional field; default value: 100; maximum value: 1000 |
| offset | integer | offset in the results array of returned pages; optional field; default value: 0; if you specify 10, the first ten pages in the results array are omitted and data is provided for the subsequent pages |
| filters | array | array of results filtering parameters; optional field; you can add several filters at once (8 filters maximum); set a logical operator (and, or) between the conditions; supported operators: <, <=, >, >=, =, <>, in, not_in, like, not_like, ilike, not_ilike; with like and not_like, you can use the % wildcard to match any string of zero or more characters; example: ["meta.internal_links_count",">","1"]; the full list of possible filters is available in the API documentation |
| order_by | array | results sorting rules; optional field; you can sort by the same fields as in the filters array; possible sorting types: asc (ascending order) and desc (descending order); separate the field name and the sorting type with a comma, e.g. ["page_spam_score,desc"]; you can set no more than three sorting rules in a single request, separated by commas; example: ["page_spam_score,desc","words_count,asc"] |
| tag | string | user-defined task identifier; optional field; character limit: 255; you can use this parameter to identify the task and match it with the result; the specified tag value is returned in the data array of the response |
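The limits listed for these fields can be checked on the client before a task is sent, which avoids a round trip for an obviously invalid request. The following is a minimal sketch assuming the limits exactly as stated above; `make_task` and `count_conditions` are hypothetical helpers, the positive lower bound on limit is an assumption (only the maximum is documented), and filter counting assumes either a single condition or a flat list of conditions joined by "and"/"or".

```python
from typing import Optional

# Limits taken from the field descriptions above.
MAX_LIMIT = 1000
MAX_FILTERS = 8
MAX_SORT_RULES = 3
MAX_TAG_LENGTH = 255


def count_conditions(filters: list) -> int:
    """Count conditions in a filters array.

    Assumes either a single condition, e.g. ["meta.internal_links_count", ">", "1"],
    or a flat list of conditions joined by "and"/"or" strings.
    """
    if filters and isinstance(filters[0], str):
        return 1
    return sum(1 for part in filters if isinstance(part, list))


def make_task(target: str, similarity: int = 2, limit: int = 100, offset: int = 0,
              filters: Optional[list] = None, order_by: Optional[list] = None,
              tag: Optional[str] = None) -> dict:
    """Build one task object, rejecting values outside the documented limits."""
    if not 0 <= similarity <= 4:
        raise ValueError("similarity must be from 0 to 4")
    # Assumption: limit must be positive; the docs state only the maximum.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be from 1 to {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must not be negative")
    task = {"target": target, "similarity": similarity, "limit": limit, "offset": offset}
    if filters:
        if count_conditions(filters) > MAX_FILTERS:
            raise ValueError(f"at most {MAX_FILTERS} filters are allowed")
        task["filters"] = filters
    if order_by:
        if len(order_by) > MAX_SORT_RULES:
            raise ValueError(f"at most {MAX_SORT_RULES} sorting rules are allowed")
        for rule in order_by:
            # Each rule is "field,direction", e.g. "page_spam_score,desc".
            field, _, direction = rule.partition(",")
            if not field or direction not in ("asc", "desc"):
                raise ValueError(f"invalid sorting rule: {rule!r}")
        task["order_by"] = order_by
    if tag is not None:
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag must not exceed {MAX_TAG_LENGTH} characters")
        task["tag"] = tag
    return task
```

For example, `make_task("https://dataforseo.com", similarity=4, limit=10, order_by=["page_spam_score,desc"])` returns a dict that can be placed directly in the POST array.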
In response, the API server returns JSON-encoded data containing a tasks array with information specific to each task you set.
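The language examples above check only the top-level status_code. Each element of the tasks array also carries its own status_code and status_message (see the field descriptions that follow), so a client may want to check tasks individually as well. This is a minimal sketch assuming the response is already decoded into a Python dict and that 20000 also signals success at the task level (the examples above confirm it only for the top-level code); `failed_tasks` is a hypothetical helper name.

```python
SUCCESS_CODE = 20000  # the success code checked by the examples above


def failed_tasks(response: dict) -> list:
    """Return (id, status_code, status_message) for each task that did not succeed.

    Raises RuntimeError if the top-level status_code signals a general error.
    Assumes 20000 also means success at the task level.
    """
    if response.get("status_code") != SUCCESS_CODE:
        raise RuntimeError(
            f"error. Code: {response.get('status_code')} "
            f"Message: {response.get('status_message')}"
        )
    failures = []
    for task in response.get("tasks") or []:
        if task.get("status_code") != SUCCESS_CODE:
            failures.append((task.get("id"), task.get("status_code"), task.get("status_message")))
    return failures
```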
Description of the fields in the response:
| Field name | Type | Description |
|---|---|---|
| version | string | the current version of the API |
| status_code | integer | general status code; you can find the full list of the response codes at https://docs.dataforseo.com/v3/appendix/errors. Note: we strongly recommend designing a system for handling related exceptional or error conditions |
| status_message | string | general informational message; you can find the full list of general informational messages in the API documentation |
| time | string | execution time, seconds |
| cost | float | total tasks cost, USD |
| tasks_count | integer | the number of tasks in the tasks array |
| tasks_error | integer | the number of tasks in the tasks array returned with an error |
| tasks | array | array of tasks |
Fields of each object in the tasks array:
| Field name | Type | Description |
|---|---|---|
| id | string | unique task identifier in our system, in the UUID format |
| status_code | integer | status code of the task; generated by DataForSEO; can be within the following range: 10000-60000; you can find the full list of the response codes at https://docs.dataforseo.com/v3/appendix/errors |
| status_message | string | informational message of the task; you can find the full list of general informational messages in the API documentation |
| time | string | execution time, seconds |
| cost | float | cost of the task, USD |
| result_count | integer | number of elements in the result array |
| path | array | URL path |
| data | array | contains the same parameters that you specified in the POST request |
| result | array | array of results |
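Fields of each object in the result array:
| Field name | Type | Description |
|---|---|---|
| target | string | target from the POST array |
| similarity | integer | content similarity score from the POST array |
| total_count | integer | total number of relevant items in the database |
| items_count | integer | number of items in the items array |
| items | array | items array |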
Fields of each element in the items array:
| Field name | Type | Description |
|---|---|---|
| type | string | type of element = 'backlinks_content_duplicate' |
| similarity | integer | content similarity score; can take values from 0 to 4 |
| main_domain | string | main website domain; does not include subdomains |
| domain | string | domain where the page was found |
| tld | string | top-level domain in the DNS root zone |
| page | string | URL of the relevant page |
| ip | string | IP address of the page |
| first_visited | string | date and time when our crawler visited this page for the first time; in the UTC format: "yyyy-mm-dd hh:mm:ss +00:00"; example: 2017-01-24 13:20:59 +00:00 |
| prev_visited | string | date and time of our crawler's visit to the page prior to the most recent one; in the UTC format: "yyyy-mm-dd hh:mm:ss +00:00"; example: 2017-01-24 13:20:59 +00:00 |
| fetch_time | string | most recent date and time when our crawler visited the page; in the UTC format: "yyyy-mm-dd hh:mm:ss +00:00"; example: 2017-01-24 13:20:59 +00:00 |
| status_code | integer | HTTP status code of the page |
| location | string | location header; indicates the URL the page redirects to, if any |
| size | integer | page size, in bytes |
| encoded_size | integer | size of the encoded page, in bytes |
| content_encoding | string | type of encoding |
| media_type | string | media type used to display the page |
| server | string | server version |
| meta | object | page meta data |
Fields of the meta object:
| Field name | Type | Description |
|---|---|---|
| title | string | page title |
| canonical | string | canonical page |
| internal_links_count | integer | number of internal links on the page |
| external_links_count | integer | number of external links on the page |
| images_count | integer | number of images on the page |
| words_count | integer | number of words on the page |
| page_spam_score | integer | spam score of the page; this metric indicates how spammy the page is, considering various signals; learn more about how the score is calculated in the DataForSEO help center |
| social_media_tags | object | social media tags found on the page and their content; supported tags include but are not limited to Open Graph and Twitter card |
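Reading a response means walking the nested arrays described above: tasks, then result, then items, with meta inside each item. The sketch below assumes a response already decoded into a Python dict (e.g. by the RestClient examples) and collects the page URL, similarity, meta title, and fetch_time parsed into a timezone-aware datetime. It also computes the offset values needed to page through total_count items with a fixed limit, given that an offset of 10 skips the first ten pages. `iter_duplicates`, `parse_time`, and `page_offsets` are hypothetical helper names; parsing "+00:00" with `%z` requires Python 3.7 or later.

```python
from datetime import datetime
from typing import Iterator, Optional

# Timestamp format of first_visited, prev_visited and fetch_time,
# e.g. "2017-01-24 13:20:59 +00:00"; %z accepts "+00:00" on Python 3.7+.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parse_time(value: Optional[str]) -> Optional[datetime]:
    """Convert a crawler timestamp to a timezone-aware datetime (None stays None)."""
    return datetime.strptime(value, TIME_FORMAT) if value else None


def iter_duplicates(response: dict) -> Iterator[dict]:
    """Yield a compact record for every item in every task's result."""
    for task in response.get("tasks") or []:
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                meta = item.get("meta") or {}
                yield {
                    "page": item.get("page"),
                    "similarity": item.get("similarity"),
                    "title": meta.get("title"),
                    "fetch_time": parse_time(item.get("fetch_time")),
                }


def page_offsets(total_count: int, limit: int) -> list:
    """Offsets needed to cover total_count items with limit items per request."""
    return list(range(0, total_count, limit))
```

For example, with a total_count of 25 and a limit of 10, `page_offsets` returns [0, 10, 20], so three requests, each with offset set to one of these values, would cover all items.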